A hybrid approach to compounds in LVCSR
Abstract
In several languages, compound words are written as single orthographic units, which makes it harder to achieve good lexical coverage in large vocabulary continuous speech recognition (LVCSR). A common approach is to recognize the compound constituents first and then recompound them automatically. We describe an accurate compounding module that combines a rule-based approach with statistical pruning. The module is incorporated in a broadcast news recognition task for Dutch and yields an 11% relative decrease in word error rate (WER).
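The abstract does not spell out the rules or the pruning statistic, so the following is only a minimal sketch of the general idea, not the authors' module. It assumes a hypothetical set of linking elements (`LINKERS`), a hypothetical table of compound frequencies from a text corpus (`compound_counts`), and a hypothetical threshold (`min_count`). The rule-based step proposes candidate compounds from adjacent recognized words; the statistical step discards candidates that are not attested often enough.

```python
from typing import Dict, List, Optional

# Hypothetical linking elements allowed between two compound parts (assumption,
# not taken from the paper).
LINKERS = ("", "s", "en")


def recompound(
    tokens: List[str],
    compound_counts: Dict[str, int],
    min_count: int = 5,
) -> List[str]:
    """Greedily merge adjacent tokens into compounds, left to right.

    compound_counts and min_count are illustrative stand-ins for whatever
    statistics a real system would use for pruning.
    """
    out: List[str] = []
    i = 0
    while i < len(tokens):
        merged: Optional[str] = None
        if i + 1 < len(tokens):
            # Rule-based step: propose head + linker + tail candidates.
            candidates = [tokens[i] + link + tokens[i + 1] for link in LINKERS]
            # Statistical pruning: keep only sufficiently frequent candidates.
            kept = [c for c in candidates if compound_counts.get(c, 0) >= min_count]
            if kept:
                # Prefer the most frequent surviving candidate.
                merged = max(kept, key=lambda c: compound_counts[c])
        if merged is not None:
            out.append(merged)
            i += 2  # both constituents were consumed
        else:
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    counts = {"voetbalwedstrijd": 40, "stadsbestuur": 12, "dewedstrijd": 1}
    print(recompound(["de", "voetbal", "wedstrijd"], counts))
    print(recompound(["het", "stad", "bestuur"], counts))
```

With the toy counts above, `["de", "voetbal", "wedstrijd"]` becomes `["de", "voetbalwedstrijd"]`, while the rare candidate `dewedstrijd` is pruned. This sketch merges pairs only; compounds with three or more parts would need repeated passes or a longer candidate window.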
Similar resources
A hybrid language model for open-vocabulary Thai LVCSR
This paper investigates the use of a hybrid language model for open-vocabulary Thai LVCSR. Thai text is written without word boundary markers and the definition of word unit is often ambiguous due to the presence of compound words. Hence, to build open-vocabulary LVCSR, a very large lexicon is required to also handle word unit ambiguity. Pseudomorpheme (PM), a syllable-like sub-word unit specif...
Investigation of Maximum Entropy Hybrid Language Models for Open Vocabulary German and Polish LVCSR
For languages like German and Polish, higher numbers of word inflections lead to high out-of-vocabulary (OOV) rates and high language model (LM) perplexities. Thus, one of the main challenges in large vocabulary continuous speech recognition (LVCSR) is recognizing an open vocabulary. In this paper, we investigate the use of mixed type of sub-word units in the same recognition lexicon. Namely, m...
Efficient search using posterior phone probability estimates
In this paper we present a novel, efficient search strategy for large vocabulary continuous speech recognition (LVCSR). The search algorithm, based on stack decoding, uses posterior phone probability estimates to substantially increase its efficiency with minimal effect on accuracy. In particular, the search space is dramatically reduced by phone deactivation pruning where phones with a small l...
Tied posteriors: an approach for effective introduction of context dependency in hybrid NN/HMM LVCSR
This paper presents a method to improve the recognition rate of hybrid connectionist/HMM speech recognition systems. At the same time, this approach allows the easy introduction of context-dependent models in the hybrid framework. The approach is based on a standard hybrid connectionist/HMM recognizer, in which the neural nets are trained to estimate the a posteriori probabilities for all phone...
Combining multiple-type input units using recurrent neural network for LVCSR language modeling
In this paper, we investigate the use of a Recurrent Neural Network (RNN) in combining hybrid input types, namely word and pseudo-morpheme (PM) for Thai LVCSR language modeling. Similar to other neural network frameworks, there is no restriction on RNN input types. To exploit this advantage, the input vector of a proposed hybrid RNN language model (RNNLM) is a concatenated vector of word and PM...